We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking at only a fraction of their entries. Our method combines a neural network encoder with a tensor train decomposition to learn a low-rank latent encoding, coupled with cross approximation (CA) to learn the representation through a subset of the original samples. CA is an adaptive sampling algorithm that is native to tensor decompositions and avoids working with the full high-resolution data explicitly. Instead, it actively selects local representative samples that we fetch out-of-core and on demand. The required number of samples grows only logarithmically with the size of the input. Our implicit representation of the tensor in the network enables processing large grids that could not otherwise fit in memory in their uncompressed form. The proposed approach is particularly useful for large-scale multidimensional grid data (e.g., 3D tomography) and for tasks that require context over a large receptive field (e.g., predicting the medical condition of entire organs). The code is available at https://github.com/aelphy/c-pic.
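To give intuition for the sampling principle, here is a minimal numpy sketch of adaptive cross approximation on a matrix (the tensor-train case applies the same cross idea along each mode). This is our own toy illustration, not the C-PIC implementation; `get_entry` stands in for the on-demand, out-of-core sample fetcher.

```python
import numpy as np

def aca(get_entry, shape, rank):
    """Adaptive cross approximation: build a rank-`rank` factorization
    A ~= U @ V while sampling only individual rows and columns on demand,
    never materializing the full matrix."""
    m, n = shape
    U, V = np.zeros((m, 0)), np.zeros((0, n))
    used_rows, i = set(), 0
    for _ in range(rank):
        used_rows.add(i)
        # residual of pivot row i, using only the crosses sampled so far
        row = np.array([get_entry(i, j) for j in range(n)]) - U[i] @ V
        j = int(np.argmax(np.abs(row)))          # pivot column
        if abs(row[j]) < 1e-12:                  # residual vanished: done
            break
        col = np.array([get_entry(k, j) for k in range(m)]) - U @ V[:, j]
        U = np.hstack([U, (col / row[j])[:, None]])
        V = np.vstack([V, row[None, :]])
        cand = np.abs(col)
        cand[list(used_rows)] = -1.0             # pick a fresh pivot row
        i = int(np.argmax(cand))
    return U, V
```

Each step touches one row and one column of residuals, so a rank-`r` approximation costs roughly `r * (m + n)` entry evaluations instead of `m * n`.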
translated by Google Translate
Bayesian optimization (BO) is a widely popular approach for the hyperparameter optimization (HPO) of machine learning algorithms. At its core, BO iteratively evaluates promising configurations until a user-defined budget, such as wall-clock time or number of iterations, is exhausted. While the final performance after tuning heavily depends on the provided budget, it is hard to pre-specify an optimal value in advance. In this work, we propose an effective and intuitive termination criterion for BO that automatically stops the procedure if it is sufficiently close to the global optimum. Across an extensive range of real-world HPO problems, we show that our termination criterion achieves better test performance compared to existing baselines from the literature, such as stopping when the probability of improvement drops below a fixed threshold. We also provide evidence that these baselines are, compared to our method, highly sensitive to the choice of their own hyperparameters. In addition, we find that overfitting may occur in the context of HPO, which is arguably an overlooked problem in the literature, and show that our termination criterion mitigates this phenomenon on both small and large datasets.
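The flavor of such a criterion can be sketched as follows: stop when the gap between the best upper and best lower confidence bounds, a crude upper bound on the simple regret, falls below a threshold. This is an illustrative stand-in written by us, not the paper's exact criterion (which also accounts for the statistical error of the objective estimate).

```python
import numpy as np

def regret_bound_stop(mu, sigma, beta=2.0, eps=0.01):
    """Stop BO once a crude upper bound on the simple regret drops below
    `eps`. `mu` and `sigma` are the GP posterior mean and standard
    deviation over a candidate grid (maximization convention)."""
    ucb = mu + beta * sigma
    lcb = mu - beta * sigma
    regret_bound = ucb.max() - lcb.max()  # optimistic best minus pessimistic best
    return bool(regret_bound < eps)
```

When the posterior is still uncertain the bound stays large and the loop continues; once the surrogate is confident around the incumbent, the bound collapses and the procedure stops.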
As modern data pipelines continue to collect, produce, and store a variety of data formats, extracting and combining value from traditional and context-rich sources such as strings, text, video, audio, and logs becomes a manual process, since such formats are unsuitable for an RDBMS. To tap into this dark data, domain experts analyze and extract insights and integrate them into the data repositories. This process can involve out-of-DBMS, ad-hoc analysis and processing, resulting in ETL overhead, engineering effort, and suboptimal performance. While AI systems based on ML models can automate the analysis process, they often generate further context-rich answers. Using multiple sources of truth, either for training the models or in the form of knowledge bases, further exacerbates the problem of consolidating the data of interest. We envision an analytical engine co-optimized with components that enable context-rich analysis. First, as data coming from different sources or resulting from model answering cannot be cleaned ahead of time, we propose online data integration via model-assisted similarity operations. Second, we aim for holistic, cost- and rule-based pipeline optimization across relational and model-based operators. Third, with increasingly heterogeneous hardware and equally heterogeneous workloads, ranging from traditional relational analytics to generative model inference, we envision a system that adapts just in time to complex analytical query requirements. To solve increasingly complex analytical problems, ML offers attractive solutions that must be combined with traditional analytical processing and benefit from decades of database community research to achieve scalability and performance that are effortless for the end user.
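As a toy illustration of a model-assisted similarity operation, the sketch below joins two relations on embedding similarity instead of key equality. The `embed` model and the threshold are placeholders for whatever learned encoder and tuning the envisioned engine would use.

```python
import numpy as np

def similarity_join(left, right, embed, threshold=0.9):
    """Model-assisted similarity join: match rows whose embeddings (from a
    learned model `embed`) have cosine similarity above `threshold`,
    instead of requiring exact key equality."""
    L = np.array([embed(x) for x in left], dtype=float)
    R = np.array([embed(x) for x in right], dtype=float)
    L /= np.linalg.norm(L, axis=1, keepdims=True)
    R /= np.linalg.norm(R, axis=1, keepdims=True)
    sims = L @ R.T  # pairwise cosine similarities
    return [(left[i], right[j], float(sims[i, j]))
            for i, j in zip(*np.where(sims >= threshold))]
```

In a real engine this dense all-pairs scan would be replaced by an approximate nearest-neighbor index, and the operator would participate in cost-based optimization alongside relational joins.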
Text-based personality computing (TPC) has attracted much research interest in NLP. In this paper, we describe 15 challenges that we believe deserve the attention of the research community. These challenges are organized under the following topics: personality taxonomies, measurement quality, datasets, performance evaluation, modelling choices, and ethics and fairness. In addressing each challenge, we not only combine perspectives from both NLP and the social sciences, but also offer concrete suggestions towards more valid and reliable TPC research.
Stance detection (SD) can be considered a special case of textual entailment recognition (TER), a generic natural language task. Modelling SD as TER may offer benefits like more training data and a more general learning scheme. In this paper, we present an initial empirical analysis of this approach. We apply it to a difficult but relevant test case where no existing labelled SD dataset is available, because this is where modelling SD as TER may be especially helpful. We also leverage measurement knowledge from social sciences to improve model performance. We discuss our findings and suggest future research directions.
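The recasting itself can be as simple as templating each stance example into a premise-hypothesis pair with mapped labels. The template and label mapping below are our own illustrative choices, not necessarily those used in the paper.

```python
def stance_to_entailment(text, target, stance):
    """Recast a stance-detection example as a textual-entailment example:
    the document is the premise, and a templated claim about the target
    is the hypothesis."""
    premise = text
    hypothesis = f"The author is in favor of {target}."
    label = {"favor": "entailment",
             "against": "contradiction",
             "none": "neutral"}[stance]
    return premise, hypothesis, label
```

Once recast this way, any off-the-shelf entailment model and dataset can be reused for stance detection, which is the source of the extra training data the abstract mentions.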
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Runtime verification (RV) has the potential to enable the safe operation of safety-critical systems that are too complex to verify formally, such as Robot Operating System 2 (ROS2) applications. Writing correct monitors can itself be complex, and errors in the monitoring subsystem threaten the mission as a whole. This paper gives an overview of a formal approach to generating runtime monitors for autonomous robots from requirements written in structured natural language. Our approach integrates the Formal Requirement Elicitation Tool (FRET) with Copilot, a runtime verification framework, through the Ogma integration tool. FRET is used to specify requirements with unambiguous semantics, which are then automatically translated into temporal logic formulae. Ogma generates monitor specifications from the FRET output, which are compiled into hard real-time C99. To facilitate the integration of the monitors in ROS2, we have extended Ogma to generate ROS2 packages defining monitoring nodes that run the monitors when new data becomes available and publish the results of any violations. The goal of our approach is to treat the generated ROS2 packages as black boxes and integrate them into larger ROS2 systems with minimal effort.
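For intuition, the sketch below shows the shape of such a monitor: a component that checks a property on each incoming sample and reports violations through a callback (standing in for a ROS2 publisher). This is a hand-written Python illustration of the pattern only; the actual toolchain emits hard real-time C99 from Copilot specifications.

```python
class RuntimeMonitor:
    """Minimal stream monitor: evaluate a property on every new sample
    and report violations via a callback."""

    def __init__(self, prop, on_violation):
        self.prop = prop                  # predicate over (history, sample)
        self.on_violation = on_violation  # callback, e.g. a publisher
        self.history = []

    def step(self, sample):
        ok = self.prop(self.history, sample)
        if not ok:
            self.on_violation(sample)     # publish the violation
        self.history.append(sample)
        return ok
```

Wrapped in a ROS2 node, `step` would be the subscription callback, and `on_violation` would publish to a violations topic, which matches the black-box integration goal described above.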
Anatomical tracing data provides detailed information about brain circuitry that is essential for addressing some of the common errors in diffusion MRI tractography. However, automated detection of fiber bundles on tracing data is challenging due to the presence of truncation, noise, and artifacts, as well as intensity/contrast variations. In this work, we propose a deep learning method with a loss function that incorporates anatomy-based constraints to accurately segment fiber bundles on tracer sections of macaque brains. In addition, given the limited availability of manual labels, we use a semi-supervised training technique to make effective use of unlabeled data to improve performance, and location constraints to further reduce false positives. Evaluation of our method on unseen sections from a different macaque yields promising results, with a true positive rate of about 0.90. The code for our method is available at https://github.com/v-sundaresan/fiberbundle_seg_tracing.
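A generic version of a segmentation loss with an anatomy-based constraint can be sketched as follows. The soft Dice term and the outside-prior penalty are illustrative stand-ins we chose for this sketch, not the paper's actual loss terms.

```python
import numpy as np

def combined_loss(pred, target, anat_prior, lam=0.5, eps=1e-6):
    """Soft Dice loss on labelled pixels plus a penalty on predicted
    foreground that falls outside an anatomical prior mask.

    pred:       predicted foreground probabilities
    target:     binary ground-truth mask
    anat_prior: binary mask of anatomically plausible locations
    """
    inter = (pred * target).sum()
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    # fraction of predicted foreground outside plausible anatomy
    outside = (pred * (1 - anat_prior)).sum() / (pred.sum() + eps)
    return (1 - dice) + lam * outside
```

The second term is one way to encode a location constraint: predictions that agree with the labels but land outside the prior are still penalized, which discourages false positives in implausible regions.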
We present a deep learning approach for approximating solutions of the Hamilton-Jacobi-Bellman partial differential equation (HJB PDE) associated with the nonlinear quadratic regulator (NLQR) problem. A state-dependent Riccati equation control law is first used to generate a gradient-augmented synthetic dataset for supervised learning. The resulting model then serves as a warm start for the minimization of a loss function based on the residual of the HJB PDE. The combination of supervised learning and residual minimization avoids spurious solutions and mitigates the data inefficiency of a supervised-learning-only approach. Numerical tests validate the distinct advantages of the proposed methodology.
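The two-stage structure (supervised warm start, then HJB-residual minimization) can be illustrated on a scalar toy problem of our own construction, where the value function is V(x) = p x^2 and the HJB equation reduces to a pointwise residual in the single parameter p. The real method uses neural networks and automatic differentiation rather than this one-parameter model.

```python
import numpy as np

# Toy scalar LQR: dynamics x' = a*x + b*u, running cost q*x^2 + r*u^2,
# value function ansatz V(x) = p*x^2.
a, b, q, r = -1.0, 1.0, 1.0, 1.0

def hjb_residual(p, x):
    V_x = 2.0 * p * x                        # gradient of V(x) = p*x^2
    return q * x**2 + a * x * V_x - (b**2 / (4.0 * r)) * V_x**2

xs = np.linspace(-2.0, 2.0, 41)              # collocation points

# Stage 1 (supervised warm start): least-squares fit of p to noisy
# synthetic values, standing in for the SDRE-generated dataset.
p_true = np.sqrt(2.0) - 1.0                  # Riccati solution for these coefficients
Vs = p_true * xs**2 + 0.01 * np.random.default_rng(0).normal(size=xs.size)
p = float((xs**2 @ Vs) / (xs**4).sum())

# Stage 2 (residual minimization): refine p by descending the mean
# squared HJB residual, here with a finite-difference gradient.
for _ in range(200):
    mse = np.mean(hjb_residual(p, xs)**2)
    g = (np.mean(hjb_residual(p + 1e-6, xs)**2) - mse) / 1e-6
    p -= 0.01 * g
```

The warm start lands near the correct Riccati coefficient, and the residual stage then refines it toward the exact HJB solution, mirroring how the warm start keeps the residual minimization away from spurious roots.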
As autonomous systems increasingly become part of our daily lives, ensuring their trustworthiness is crucial. There are many techniques for demonstrating trustworthiness; common to all of them is the need to articulate specifications. In this paper, we take a broad view of specification, concentrating on top-level requirements including, but not limited to, functionality, safety, security, and other non-functional properties. The main contribution of this article is a set of high-level intellectual challenges for the autonomous systems community related to specifying for trustworthiness. We also describe unique specification challenges arising in a number of application domains for autonomous systems.